Finding Hedges by Chasing Weasels: Hedge Detection Using Wikipedia Tags and Shallow Linguistic Features

نویسندگان

  • Viola Ganter
  • Michael Strube
چکیده

We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Weasels, Hedges and Peacocks: Discourse-level Uncertainty in Wikipedia Articles

Uncertainty is an important linguistic phenomenon that is relevant in many areas of language processing. While earlier research mostly concentrated on the semantic aspects of uncertainty, here we focus on discourseand pragmaticsrelated aspects of uncertainty. We present a classification of such linguistic phenomena and introduce a corpus of Wikipedia articles in which the presented types of dis...

متن کامل

A Lucene and Maximum Entropy Model Based Hedge Detection System

This paper describes the approach to hedge detection we developed, in order to participate in the shared task at CoNLL 2010. A supervised learning approach is employed in our implementation. Hedge cue annotations in the training data are used as the seed to build a reliable hedge cue set. Maximum Entropy (MaxEnt) model is used as the learning technique to determine uncertainty. By making use of...

متن کامل

Uncertainty Detection as Approximate Max-Margin Sequence Labelling

This paper reports experiments for the CoNLL 2010 shared task on learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence level hedge detection in the biological domain we use an L1regularised binary support vector machine, while for sentence level weasel detection in the Wi...

متن کامل

A Cascade Method for Detecting Hedges and their Scope in Natural Language Text

Detecting hedges and their scope in natural language text is very important for information inference. In this paper, we present a system based on a cascade method for the CoNLL-2010 shared task. The system composes of two components: one for detecting hedges and another one for detecting their scope. For detecting hedges, we build a cascade subsystem. Firstly, a conditional random field (CRF) ...

متن کامل

HedgeHunter: A System for Hedge Detection and Uncertainty Classification

With the dramatic growth of scientific publishing, Information Extraction (IE) systems are becoming an increasingly important tool for large scale data analysis. Hedge detection and uncertainty classification are important components of a high precision IE system. This paper describes a two part supervised system which classifies words as hedge or nonhedged and sentences as certain or uncertain...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009